Elucidating tumor heterogeneity from spatially resolved transcriptomics data by multi-view graph collaborative learning

Spatially resolved transcriptomics (SRT) technology enables us to gain novel insights into tissue architecture and cell development, especially in tumors. However, the lack of computational methods that exploit biological contexts and multi-view features severely hinders the elucidation of tissue heterogeneity. Here, we propose stMVC, a multi-view graph collaborative-learning model that uses attention to integrate histology, gene expression, spatial location, and biological contexts in analyzing SRT data. Specifically, stMVC adopts a semi-supervised graph attention autoencoder to separately learn view-specific representations of the histological-similarity graph and the spatial-location graph, and then simultaneously integrates the two-view graphs into robust representations through attention under semi-supervision of biological contexts. stMVC outperforms other tools in detecting tissue structure, inferring trajectory relationships, and denoising on benchmark slices of human cortex. In particular, stMVC identifies disease-related cell-states and their transition cell-states in a breast cancer study, which are further validated by functional and survival analysis of independent clinical data. These results demonstrate clinical and prognostic applications of SRT data.


Supplementary Note 1 Evaluation of proportion of labels for model training
stMVC is robust to the proportion of labels used for training, which was evaluated on 12 slices of the human DLPFC dataset with annotations from a previous study 1. Specifically, for each slice, we (i) randomly selected spots with labels, with the proportion ranging from 0.1 to 0.9 in steps of 0.1, thus generating nine label datasets; (ii) randomly selected 70% of spots as the training set, whose labels, taken from one of the nine datasets, were used to supervise the training of stMVC, stMVC-M, semi-AE, and the three SGATE-based single-view models; and (iii) for each model, predicted cell clusters with the Louvain algorithm and assessed the influence of the proportion of labels on model training via clustering accuracy, measured as the average silhouette width (ASW), which quantifies the closeness of the low-dimensional joint features between spots within each predicted cell cluster (see Evaluation of clustering). Overall, we observed that the clustering accuracy of all models increases slightly with the proportion of labels used for training, and almost all models reach a higher accuracy when trained with 70% of the labels.
Hence, we treated 70% as the cutoff for selecting labels for model training. Additionally, we found that (i) stMVC achieves performance higher than or comparable to that of stMVC-M and SGATE-SLG; (ii) stMVC, stMVC-M, and SGATE-SLG perform better than the two HSG-based models; (iii) SGATE-HSG performs better than SGATE-HSG-N; and (iv) SGATE-SLG performs better than semi-AE, showing that the graph attention mechanism is responsible for capturing the data structure (Supplementary Fig. 1). Overall, these results indicate the effectiveness of stMVC.
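The label-proportion procedure described in this note can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function names (`mask_labels`, `training_split`, `average_silhouette_width`), the use of -1 to mark unlabeled spots, the Euclidean distance in the silhouette computation, and the toy labels are all hypothetical choices made here. Cluster assignments (obtained with Louvain in the study) are assumed to be given.

```python
import numpy as np

def mask_labels(labels, proportion, rng):
    """Keep a random `proportion` of the labels; mark the rest as -1 (assumed convention)."""
    labels = np.asarray(labels)
    n_keep = int(round(proportion * labels.size))
    keep = rng.choice(labels.size, size=n_keep, replace=False)
    masked = np.full(labels.shape, -1, dtype=int)
    masked[keep] = labels[keep]
    return masked

def training_split(n_spots, fraction, rng):
    """Boolean mask selecting a random `fraction` of spots as the training set."""
    n_train = int(round(fraction * n_spots))
    mask = np.zeros(n_spots, dtype=bool)
    mask[rng.choice(n_spots, size=n_train, replace=False)] = True
    return mask

def average_silhouette_width(features, clusters):
    """Mean silhouette width using Euclidean distances (dense O(n^2) distance matrix)."""
    features = np.asarray(features, dtype=float)
    clusters = np.asarray(clusters)
    dist = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    ids = np.unique(clusters)
    widths = np.zeros(len(clusters))
    for i in range(len(clusters)):
        same = clusters == clusters[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton clusters get width 0 by convention
        a = dist[i, same].sum() / (n_same - 1)  # mean distance within own cluster
        b = min(dist[i, clusters == c].mean() for c in ids if c != clusters[i])
        denom = max(a, b)
        widths[i] = (b - a) / denom if denom > 0 else 0.0
    return float(widths.mean())

rng = np.random.default_rng(0)
toy_labels = rng.integers(0, 3, size=200)      # toy stand-in for region annotations
proportions = [k / 10 for k in range(1, 10)]   # 0.1, 0.2, ..., 0.9
label_sets = {p: mask_labels(toy_labels, p, rng) for p in proportions}
train_mask = training_split(toy_labels.size, 0.7, rng)
```

Each entry of `label_sets` would then supervise one training run restricted to `train_mask`, and `average_silhouette_width` would be applied to the learned features and the predicted clusters.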

Modeling gene expression data by autoencoder-based framework
Regarding gene expression data, following our previous study, we modeled it as drawn from a negative binomial (NB) distribution using an autoencoder-based framework 2. Specifically, we learned its low-dimensional features through an encoder and then transformed them into the parameters of the NB distribution through the corresponding decoders. In this work, each neural network uses batch normalization, ReLU is used as the activation function between two hidden layers, and the Adam optimizer with both a 1 AB weight decay and 8 AD learning rate is used to minimize the loss function. In addition, we used the autoencoder structure [N, 1000, 50, 1000, N] to capture the inner structure of the gene expression data. Here, N is 2,000 for the data from Visium and 1,020 for the data from STARmap.
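As an illustration of an NB-based reconstruction loss, the sketch below evaluates the negative log-likelihood of counts under an NB distribution. It assumes the common mean/dispersion parameterization (`mu`, `theta`), which the text here does not spell out; the function names and the small `eps` stabilizer are hypothetical choices made for this example.

```python
import math
import numpy as np

# Element-wise log-gamma built from the standard library (avoids a SciPy dependency).
_lgamma = np.vectorize(math.lgamma, otypes=[float])

def nb_log_prob(x, mu, theta, eps=1e-10):
    """Element-wise log P(x | mu, theta) for an NB with mean mu and dispersion theta."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    theta = np.asarray(theta, dtype=float)
    log_denominator = np.log(theta + mu + eps)
    return (
        _lgamma(x + theta) - _lgamma(theta) - _lgamma(x + 1.0)
        + theta * (np.log(theta + eps) - log_denominator)
        + x * (np.log(mu + eps) - log_denominator)
    )

def nb_negative_log_likelihood(x, mu, theta):
    """Summed NB negative log-likelihood, usable as a reconstruction loss."""
    return float(-nb_log_prob(x, mu, theta).sum())
```

In an autoencoder of shape [N, 1000, 50, 1000, N], `mu` and `theta` would be the decoder outputs for each of the N genes, and `x` the observed counts.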

Learning representations from RNA-seq data by semi-AE model
To clarify whether the graph attention mechanism is responsible for capturing the complex data structure, we further extended the AE model described in "Modeling gene expression data by autoencoder-based framework" to predict spot classes in a semi-supervised manner from the region segmentation, with the prediction obtained through a softmax function. The corresponding loss function is computed over the labeled spots and the classes, comparing, for each labeled spot, its label vector from the region segmentation with its predicted vector.
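For reference, one standard form consistent with these definitions is the cross-entropy over labeled spots. The symbols below ($n_l$ for the number of labeled spots, $K$ for the number of classes, $y_i$ and $\hat{y}_i$ for the label and predicted vectors of spot $i$, $f_c$ for the classifier, $z_i$ for the learned features) are notation assumed here, and the normalization by $n_l$ is likewise an assumption:

```latex
\hat{y}_i = \mathrm{softmax}\bigl(f_c(z_i)\bigr), \qquad
L_{\mathrm{pre}} = -\frac{1}{n_l} \sum_{i=1}^{n_l} \sum_{k=1}^{K} y_{ik} \, \log \hat{y}_{ik}
```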
Taken together, the loss function of the semi-AE model combines the two loss functions above, with a parameter that controls their relative weight; its default value is 90, at which the semi-AE model achieves better performance in our large-scale experiments.
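A minimal sketch of such a weighted combination is given below. It assumes that the weight multiplies the classification term and that the classification loss is a mean cross-entropy over labeled spots; neither detail is confirmed by the text, and the function names and the -1 convention for unlabeled spots are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(logits, labels, n_classes):
    """Mean cross-entropy over labeled spots; spots with label -1 are ignored."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    labeled = labels >= 0
    if not labeled.any():
        return 0.0
    probs = softmax(logits[labeled])
    one_hot = np.eye(n_classes)[labels[labeled]]
    return float(-(one_hot * np.log(probs + 1e-12)).sum(axis=1).mean())

def semi_ae_loss(reconstruction_loss, logits, labels, n_classes, weight=90.0):
    """Assumed form: reconstruction loss plus `weight` times classification loss."""
    return reconstruction_loss + weight * cross_entropy(logits, labels, n_classes)
```

With `weight=90.0`, the classification term dominates unless the reconstruction loss is large, which is one way a single scalar can balance the two objectives.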

Statistical model for testing genes enriched in different cell populations
We designed a Fisher's exact test-based measure to check whether or not two genes (or two